Update the default Gunicorn API server workers count to one #1454
Conversation
container = BentoMLContainer()
config = BentoMLConfiguration()
if "BENTOML_GUNICORN_TIMEOUT" in os.environ:
    config.override(["api_server", "port"], int(os.environ.get("BENTOML_GUNICORN_TIMEOUT")))
if "BENTOML_GUNICORN_NUM_OF_WORKERS" in os.environ:
    config.override(["api_server", "workers"], int(os.environ.get("BENTOML_GUNICORN_NUM_OF_WORKERS")))
container.config.from_dict(config.as_dict())

from bentoml import service, yatai

container.wire(packages=[service, yatai])
We don't have a good way right now; the deployment tests (e2e_tests) are currently running offline.
We need to refactor/improve the SageMaker deployment with better Gunicorn support anyway. I am fine with continuing to do the basic testing in the e2e tests.
Codecov Report
@@            Coverage Diff             @@
##           master    #1454      +/-   ##
==========================================
+ Coverage   67.54%   67.75%   +0.21%
==========================================
  Files         149      149
  Lines       10031    10008      -23
==========================================
+ Hits         6775     6781       +6
+ Misses       3256     3227      -29
Continue to review full report at Codecov.
container = BentoMLContainer()
config = BentoMLConfiguration()
if "BENTOML_GUNICORN_TIMEOUT" in os.environ:
    config.override(["api_server", "port"], int(os.environ.get("BENTOML_GUNICORN_TIMEOUT")))
is this a typo? s/port/timeout
Good catch. This was a typo. We should think about adding a unit test for this.
Fixed.
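For reference, a minimal sketch of the corrected override together with the kind of unit test discussed above. This is a hypothetical test, assuming BentoMLConfiguration is importable from bentoml.configuration.containers and that as_dict() exposes the overridden keys:

import os

from bentoml.configuration.containers import BentoMLConfiguration


def test_gunicorn_env_var_overrides(monkeypatch):
    # Hypothetical pytest sketch: set the env vars the serve entrypoint reads,
    # apply the fixed overrides (timeout, not port), and check the result.
    monkeypatch.setenv("BENTOML_GUNICORN_TIMEOUT", "120")
    monkeypatch.setenv("BENTOML_GUNICORN_NUM_OF_WORKERS", "2")

    config = BentoMLConfiguration()
    if "BENTOML_GUNICORN_TIMEOUT" in os.environ:
        config.override(
            ["api_server", "timeout"], int(os.environ["BENTOML_GUNICORN_TIMEOUT"])
        )
    if "BENTOML_GUNICORN_NUM_OF_WORKERS" in os.environ:
        config.override(
            ["api_server", "workers"],
            int(os.environ["BENTOML_GUNICORN_NUM_OF_WORKERS"]),
        )

    config_dict = config.as_dict()
    assert config_dict["api_server"]["timeout"] == 120
    assert config_dict["api_server"]["workers"] == 2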
@inject
def _serve(
    bento_server_timeout: int = Provide[BentoMLContainer.config.api_server.timeout],
    bento_server_workers: int = Provide[BentoMLContainer.api_server_workers],
We can also remove the CPU-core-related calculation in BentoMLContainer.api_server_workers, right?
Do we still want to maintain the behavior of automatically determining workers if workers is set to None?
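For context, a minimal sketch of that None-means-auto fallback. The helper name and the exact formula here are assumptions; the real logic lives in the BentoMLContainer.api_server_workers provider:

import multiprocessing


def api_server_workers(configured_workers=None):
    # Hypothetical sketch: honor an explicit worker count, otherwise fall back
    # to a CPU-count-based heuristic like the one this PR makes opt-in.
    if configured_workers:
        return configured_workers
    return multiprocessing.cpu_count() // 2 + 1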
Thanks, @ssheng, I'm merging this PR now, feel free to address the issue we just discussed in a follow-up PR.
…1454)
* Update the default API server Gunicorn workers to 1.
* Update dependency injection wire functionality for yatai/deployment/sagemaker/serve.
* Update wire modules in yatai/deployment/sagemaker/serve.
* Use app_server_workers in yatai/deployment/sagemaker/serve.
* Fix typo.
Description
Update the default API server Gunicorn workers count to one. Previously, the worker count was automatically determined from the number of available CPU cores.
Motivation and Context
The automatic determination was not ideal: users were often unaware of the large number of workers being created, and a large worker count could add significant memory overhead from loaded models. Users still have the ability to opt back into automatic worker-count determination through explicit configuration.
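For example, a deployment can still scale workers up through the environment variable shown in the diff above. A sketch, assuming the bentoml.configuration.containers import path:

import os

from bentoml.configuration.containers import BentoMLConfiguration, BentoMLContainer

# Sketch of the deployment-side flow: an operator sets the env var, and the
# serve entrypoint turns it into a config override before wiring the container.
os.environ["BENTOML_GUNICORN_NUM_OF_WORKERS"] = "4"

container = BentoMLContainer()
config = BentoMLConfiguration()
if "BENTOML_GUNICORN_NUM_OF_WORKERS" in os.environ:
    config.override(
        ["api_server", "workers"], int(os.environ["BENTOML_GUNICORN_NUM_OF_WORKERS"])
    )
container.config.from_dict(config.as_dict())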
How Has This Been Tested?
./dev/format.sh
./dev/lint.sh
./ci/unit_test.sh
Checklist:
* ./dev/format.sh and ./dev/lint.sh scripts have passed (instructions).